Athalye A, Carlini N, Wagner D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples[J]. arXiv preprint arXiv:1802.00420, 2018.
1. Overview
In this paper, the authors
- identify three types of obfuscated gradients (shattered gradients, stochastic gradients, vanishing/exploding gradients)
- propose Backward Pass Differentiable Approximation (BPDA) to overcome obfuscated gradients
1.1. Dataset
- MNIST & CIFAR-10. untargeted attacks
- ImageNet. 1000 randomly selected images, targeted attacks
- attacker-targeted, defender-untargeted
1.2. Network
- MNIST. 5 Conv
- CIFAR-10. ResNet
- ImageNet. InceptionV3
1.3. Attacker
- white-box. full knowledge of the architecture, weights, and defense, but not of the test-time randomness (only its distribution)
1.4. Obfuscated Gradient
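- Shattered Gradient. the defense is non-differentiable or numerically unstable, so the gradient is nonexistent or incorrect
- Stochastic Gradient. caused by randomized defenses
- Exploding & Vanishing Gradient. caused by defenses that run multiple iterations of network evaluation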
2. Attack Methods
2.1. Shattered Gradient
2.1.1. Simple
- the preprocessor g() satisfies g(x) ≈ x, so its derivative can be approximated by that of the identity function.
2.1.2. BPDA
find a differentiable approximation g() such that g(x) ≈ f_i(x)
(f_i: non-differentiable layer)
- forward. pass through f_i(x) as usual
- backward. replace f_i(x) with g(x)
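The forward/backward substitution can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the quantizer standing in for f_i, the toy classifier, the identity choice g(x) = x, and every parameter value are made up for the example and are not taken from the paper.

```python
# Illustrative BPDA sketch in pure Python. The quantizer, the toy model, and
# every parameter value are assumptions made for this example only.

def quantize(x, levels=4):
    """Non-differentiable layer f_i: snap each value in [0, 1] to a grid."""
    return [round(v * (levels - 1)) / (levels - 1) for v in x]

def score(u, w, b):
    """Toy differentiable classifier; a positive score means class 1."""
    return sum(wi * (ui + 0.5 * ui * ui) for wi, ui in zip(w, u)) + b

def score_grad(u, w):
    """Analytic gradient of score with respect to its input u."""
    return [wi * (1.0 + ui) for wi, ui in zip(w, u)]

def finite_difference_grad(x, w, b, h=1e-4):
    """Numerical gradient through quantize: zero almost everywhere."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        diff = score(quantize(up), w, b) - score(quantize(down), w, b)
        grads.append(diff / (2 * h))
    return grads

def bpda_grad(x, w):
    """Forward through quantize; backward as if quantize were g(x) = x."""
    u = quantize(x)          # forward pass uses the real layer
    return score_grad(u, w)  # backward pass: the Jacobian of g is the identity

def bpda_attack(x, w, eps=0.25, step=0.05, iters=20):
    """Signed-gradient descent on the score inside an l_inf ball of radius eps."""
    adv = list(x)
    for _ in range(iters):
        for i, g in enumerate(bpda_grad(adv, w)):
            sign = (g > 0) - (g < 0)
            moved = adv[i] - step * sign
            moved = max(x[i] - eps, min(x[i] + eps, moved))  # project onto the ball
            adv[i] = max(0.0, min(1.0, moved))               # keep a valid pixel
    return adv

x, w, b = [0.9, 0.7, 0.1], [1.0, 1.0, -1.0], -1.5
adv = bpda_attack(x, w)
print("finite-difference gradient:", finite_difference_grad(x, w, b))
print("score before / after:", score(quantize(x), w, b), score(quantize(adv), w, b))
```

The finite-difference gradient through the quantizer gives the attacker no signal, while the BPDA gradient is enough to push the score across the decision boundary within the perturbation budget.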
2.2. Stochastic Gradient
- apply Expectation over Transformation (EOT)
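A minimal sketch of EOT, assuming a hypothetical randomized defense that multiplies the input by a random 0/1 mask before a toy classifier. The mask, the classifier, and the sample count are illustrative assumptions. The idea shown is only that the attacker averages the gradient over many sampled transformations instead of trusting a single noisy gradient.

```python
import random

# Illustrative EOT sketch in pure Python. The random-mask "defense" and the
# toy classifier are assumptions made for this example only.

def score_grad(u, w):
    """Gradient of the toy score sum(w_i * (u_i + 0.5 * u_i**2)) w.r.t. u."""
    return [wi * (1.0 + ui) for wi, ui in zip(w, u)]

def sample_mask(n, keep_prob, rng):
    """Randomized transformation t: keep each coordinate with prob. keep_prob."""
    return [1.0 if rng.random() < keep_prob else 0.0 for _ in range(n)]

def eot_gradient(x, w, keep_prob=0.5, samples=2000, seed=0):
    """Monte Carlo estimate of the gradient of E_t[score(t(x))] w.r.t. x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        mask = sample_mask(len(x), keep_prob, rng)
        masked = [m * xi for m, xi in zip(mask, x)]
        grad_u = score_grad(masked, w)
        for i in range(len(x)):
            # chain rule through t(x) = mask * x
            total[i] += mask[i] * grad_u[i]
    return [t / samples for t in total]

x, w = [0.9, 0.7, 0.1], [1.0, 1.0, -1.0]
estimate = eot_gradient(x, w)
# Closed form for this toy case: keep_prob * w_i * (1 + x_i)
expected = [0.5 * wi * (1.0 + xi) for wi, xi in zip(w, x)]
print(estimate, expected)
```

For this toy defense the expectation has a closed form, so the Monte Carlo estimate can be compared against it directly. For a real randomized defense only the sampled average is available.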
2.3. Exploding&Vanishing Gradient
2.3.1. Reparameterization
For f(g(x)), where g performs an optimization loop
- make a change of variable x = h(z)
- such that g(h(z)) = h(z) for all z
- find a differentiable h
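Why the change of variable helps can be written out in one line. The symbols are assumptions for illustration: x = h(z) is the change of variable, h satisfies g(h(z)) = h(z) for all z, and J_h(z) denotes the Jacobian of h at z. Under those assumptions the optimization loop g drops out of the gradient computation:

```latex
x = h(z), \qquad g(h(z)) = h(z) \ \text{for all } z
\;\Longrightarrow\;
f(g(x)) = f(g(h(z))) = f(h(z)),
\qquad
\nabla_z f(g(h(z))) = \nabla_z f(h(z)) = J_h(z)^{\top} \, \nabla f(u)\big|_{u = h(z)}
```

The attacker therefore optimizes over z through the differentiable h and never has to backpropagate through the iterations of g.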
3. Experiments
3.1. Adversarial Training
- has been shown to be difficult at ImageNet scale. Adversarial Machine Learning at Scale
- training exclusively on ℓ∞ adversarial examples provides only limited robustness to adversarial examples under other distortion metrics. Attacking the Madry Defense Model with L1-based Adversarial Examples
3.2. Shattered Gradient
- thermometer encoding. BPDA-backward
- cropping&rescaling. EOT
- bit-depth. BPDA-identity
- JPEG. BPDA-identity
- TVM. EOT+BPDA
- Quilting. EOT+BPDA
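As an illustration of BPDA with a custom backward function (as opposed to the identity), here is a hedged sketch for thermometer encoding. It assumes the encoding sets level k to 1 when x > k/l for k = 1..l, and it assumes the clipped ramp min(max(x - k/l, 0), 1) as the differentiable surrogate. Both choices and all function names are assumptions of this sketch, not a reproduction of the authors' code.

```python
# Illustrative sketch: thermometer encoding with a surrogate backward pass.
# The encoding rule and the surrogate are assumptions made for this example.

def thermometer(x, levels=10):
    """Discrete encoding: level k (k = 1..levels) is 1 when x > k / levels."""
    return [1.0 if x > k / levels else 0.0 for k in range(1, levels + 1)]

def soft_thermometer(x, levels=10):
    """Differentiable surrogate: a clipped ramp per level."""
    return [min(max(x - k / levels, 0.0), 1.0) for k in range(1, levels + 1)]

def soft_thermometer_grad(x, levels=10):
    """Derivative of each surrogate level w.r.t. x (1 on the ramp, else 0)."""
    return [1.0 if 0.0 < x - k / levels < 1.0 else 0.0
            for k in range(1, levels + 1)]

def bpda_pixel_grad(x, upstream, levels=10):
    """Backward pass: chain the upstream gradient through the surrogate."""
    local = soft_thermometer_grad(x, levels)
    return sum(u * d for u, d in zip(upstream, local))

code = thermometer(0.66)                   # forward pass uses the real encoding
grad = bpda_pixel_grad(0.66, [1.0] * 10)   # backward pass uses the surrogate
print(code, grad)
```

The forward pass keeps the defense's exact discrete output, while the backward pass returns a usable nonzero gradient for the pixel.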
3.3. Stochastic Gradient
- SAP (random dropout at each layer). EOT
3.4. Vanishing&Exploding Gradient
- PixelDefend
- Defense-GAN